Journal: bioRxiv
Article Title: HyDrop v2: Scalable atlas construction for training sequence-to-function models
doi: 10.1101/2025.04.02.646792
Figure Legend Snippet: a, t-distributed stochastic neighbor embedding (tSNE) of all 607,330 cells generated with HyDrop v2 (340,604 cells, 18 experiments) and 10x v2 (266,736 cells, 5 experiments), colored by technique, with batch correction according to the wet-lab protocol used to generate the data. Data points were randomly shuffled before plotting. b, t-distributed stochastic neighbor embedding (tSNE) of 61,480 cells at 16-20 h AEL extracted from the sciATAC embryo atlas (Calderon et al., 2022), colored by cell type. Data points were randomly shuffled before plotting. c, Fragment size plot of HyDrop v2, 10x v2, and sciATAC-seq3 experiments. d, Genome tracks of cell-type-specific DARs, downsampled to the smallest cell count present for each technique: 1,247 cells in glia and 1,002 cells in hemocytes. The genome tracks shown are normalized for fragment count. d, Carrot plot showing the regions of accessible chromatin ±0.5 kb around the center of DARs, sorted from highest (blue) to lowest (red) accessibility, for glia and hemocytes in HyDrop v2, 10x v2, and sciATAC data. Per cell type, the left plots are generated using the coverage, while the right plots are generated using the cut-site information extracted from cell-type-specific fragment files. e, Drosophila-specific Tn5 bias from Hu et al. (2025) plotted on DARs of neuronal cells, somatic muscle cells, glia, and hemocytes. f, Sequencing efficiency compared between HyDrop v2 and 10x v2 samples. The colors are as shown in a; labels are shown for the top three motifs per cell type. g, Scatterplot of the normalized enrichment score of cell-type-specific transcription factor motifs; each dot represents a motif common to sciATAC and 10x v2 data, colored per cell type as shown in b. h, Heatmap of the top 1,000 DARs of yolk cells for 10x v2, HyDrop v2, and sciATAC data. DAR: differentially accessible region.
Article Snippet: As Tn5 insertion inputs, we used the HyDrop v2 and 10x v2 fragment files preprocessed by the PUMATAC pipeline for in-house generated data and by cellranger-arc (10x Genomics) for the public mouse multiome cortex data.
Techniques: Generated, Sequencing